2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2006-2016 Quality Declaration 
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 20/03/2019   

3. LINKAGE RESULTS, 2006-11-16, 2006 PANEL

At the completion of the linkage process 756,945 (77%) of the 979,662 records from the 2006 ACLD Panel sample were linked to a 2011 Census record to create the linked 2006-2011 ACLD file with an estimated precision of approximately 95%, or a false link rate of approximately 5%.

These record pairs were then linked to the 2016 Census via the 2011 Census record in each pair (any 2006 Panel record that had not been linked to a 2011 Census record was not given the opportunity to link to the 2016 Census). This achieved 605,618 links (80% of the linked 2011 records in the 2006 Panel), with an estimated precision of 98.6% for the direct 2011-2016 linkage. Overall, 62% of records in the original 2006 Panel sample were linked to both the 2011 and 2016 Censuses.

All results presented in this section (unless identified in the relevant table) are based on characteristics from the 2006 ACLD Panel sample and have been confidentialised to prevent the identification of individuals.

Table 1 displays the linkage rate for a range of sub-populations.

TABLE 1 - LINKAGE RATES, By Selected Characteristics

Characteristic | 2006 Panel sample (no.) | 2006-11 Linked records (no.) | 2006-11 Linkage rate (%) | 2006-11-16 Linked records (no.) | 2006-11-16 Linkage rate (%)

SEX
Male | 480 289 | 364 727 | 75.9 | 288 894 | 60.2
Female | 499 376 | 392 222 | 78.5 | 316 724 | 63.4

AGE GROUP
0-14 | 194 016 | 141 559 | 73.0 | 114 540 | 59.0
15-19 | 66 246 | 48 707 | 73.5 | 33 783 | 51.0
20-24 | 66 509 | 45 593 | 68.6 | 32 891 | 49.5
25-29 | 62 249 | 46 279 | 74.3 | 36 613 | 58.8
30-39 | 140 273 | 113 887 | 81.2 | 95 332 | 68.0
40-49 | 142 911 | 120 932 | 84.6 | 103 464 | 72.4
50-59 | 126 287 | 107 379 | 85.0 | 92 004 | 72.9
60-69 | 86 385 | 72 041 | 83.4 | 60 185 | 69.7
70-74 | 31 003 | 24 253 | 78.2 | 18 231 | 58.8
75 and over | 63 781 | 36 305 | 56.9 | 18 583 | 29.1

INDIGENOUS STATUS
Non-Indigenous | 942 253 | 733 032 | 77.8 | 588 535 | 62.5
Aboriginal | 19 697 | 12 449 | 63.2 | 8 765 | 44.5
Torres Strait Islander | 1 451 | 940 | 64.8 | 674 | 46.5
Both Aboriginal and Torres Strait Islander | 838 | 503 | 60.0 | 361 | 43.1
Not stated | 15 421 | 10 020 | 65.0 | 7 283 | 47.2

STATE/TERRITORY OF USUAL RESIDENCE
New South Wales | 323 135 | 250 070 | 77.4 | 199 416 | 61.7
Victoria | 244 097 | 191 981 | 78.6 | 154 283 | 63.2
Queensland | 192 611 | 144 427 | 75.0 | 114 859 | 59.6
South Australia | 75 476 | 59 386 | 78.7 | 48 422 | 64.2
Western Australia | 95 795 | 73 948 | 77.2 | 59 261 | 61.9
Tasmania | 23 781 | 18 624 | 78.3 | 14 815 | 62.3
Northern Territory | 8 464 | 5 573 | 65.8 | 4 057 | 47.9
Australian Capital Territory | 16 188 | 12 866 | 79.5 | 10 453 | 64.6

REMOTENESS AREA
Major Cities | 669 274 | 523 474 | 78.2 | 420 691 | 62.9
Inner Regional | 195 385 | 150 718 | 77.1 | 120 316 | 61.6
Outer Regional | 92 397 | 68 940 | 74.6 | 54 283 | 58.7
Remote | 13 988 | 9 844 | 70.4 | 7 537 | 53.9
Very Remote | 6 548 | 3 905 | 59.6 | 2 731 | 41.7
No Usual Address | 2 024 | 0 | 0.0 | 0 | 0.0

Total (a)(b)(c) | 979 662 | 756 945 | 77.2 | 605 618 | 61.8

(a) Data presented in the table have been perturbed. As a result, the sum of individual categories may not align with totals.
(b) Includes Other Territories.
(c) Includes Migratory areas.
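Each linkage rate in Table 1 is the number of linked records divided by the number of 2006 Panel sample records in that sub-population, expressed as a percentage. A minimal sketch using the Sex rows of Table 1 is shown below; it also derives a conditional rate (the share of 2006-2011 links that went on to link to 2016), which is not published in the table. The helper name `linkage_rate` is illustrative only, and because the counts are perturbed, recomputed rates for other rows may differ from the published rates in the last decimal place.

```python
def linkage_rate(linked: int, sample: int) -> float:
    """Percentage of sample records that were linked, rounded to one decimal place."""
    return round(100 * linked / sample, 1)

# Sex rows from Table 1: (2006 Panel sample, 2006-11 linked, 2006-11-16 linked)
table1_sex = {
    "Male": (480_289, 364_727, 288_894),
    "Female": (499_376, 392_222, 316_724),
}

for group, (sample, linked_11, linked_16) in table1_sex.items():
    rate_06_11 = linkage_rate(linked_11, sample)
    rate_06_16 = linkage_rate(linked_16, sample)
    # Conditional rate: of the records linked to 2011, the share also linked to 2016
    conditional_11_16 = linkage_rate(linked_16, linked_11)
    print(group, rate_06_11, rate_06_16, conditional_11_16)
```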

The linkage rates for the 2006 ACLD Panel were relatively consistent across most sub-populations and in line with expected results. Compared with the overall linkage rate of 77%, the sub-populations with the highest 2006-2011 linkage rates were persons:
  • aged 50 to 59 and 40 to 49 years (85%) and 60 to 69 years (83%);
  • of non-Indigenous origin (78%);
  • who usually lived in the Australian Capital Territory (80%); and
  • who usually lived in major cities (78%) and inner regional areas (77%).

The same sub-populations had the highest linkage rates when linked across all three Censuses (2006-2011-2016):
  • aged 50 to 59 years (73%), 40 to 49 years (72%) and 60 to 69 years (70%);
  • of non-Indigenous origin (63%);
  • who usually lived in the Australian Capital Territory (65%); and
  • who usually lived in major cities (63%) and inner regional areas (62%).

The sub-populations which achieved the lowest 2006-2011 linkage rates were persons:
  • aged 75 years and over (57%) and aged 20 to 24 (69%);
  • of Aboriginal (63%), Torres Strait Islander (65%) or both Aboriginal and Torres Strait Islander origin (60%);
  • who usually lived in the Northern Territory (66%); and
  • who usually lived in very remote areas (60%).

The lowest 2006-2011-2016 linkage rate by sub-population was for persons aged 75 years and over (29%), while the Northern Territory (48%) had the lowest linkage rate by state or territory.

Most sub-populations showed a broadly similar fall in linkage rates from the 2006-2011 linkage to the 2006-2011-2016 linkage, although the rates for certain sub-populations fell considerably. For example, 74% of records for persons aged 15 to 19 in 2006 were linked to the 2011 Census, but only 51% were linked to both the 2011 and 2016 Censuses. This is likely due to the high level of mobility of people as they move through the 20 to 29 age range.

Traditionally, the Census Post Enumeration Survey (PES) has shown that the Census has higher rates of undercount for people of Aboriginal and/or Torres Strait Islander origin, those aged between 20 and 29, and those in the Northern Territory. As expected, the lower ACLD linkage rates broadly aligned with the groups that experienced higher levels of undercount in the 2016 Census. One additional group with lower linkage rates was persons aged 75 and over at the time of the 2006 Census who, due to age, had an increased risk of death over the ensuing ten years. Further information on Census undercount can be found in Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0) and Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0).

Further data cubes demonstrating the linkage rates for various sub-populations are available as an attachment to this Information paper.


3.1 LINKAGE ACCURACY


The following quality measures were calculated for the ACLD and indicate a good level of overall quality:
  1. The linkage rate, being the proportion of the 2006 ACLD Panel records linked to a 2011 Census record and then again to a 2016 Census record, including both true matches and false links.
  2. The estimated proportion of correctly linked records, otherwise referred to as 'linkage precision'.
  3. The consistency of reporting of common information between record pairs.

3.1.1 Linkage Precision

Not all record pairs assigned as links in a data linkage process are a true match, that is, a record pair belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are actually false, i.e. the records in the link belong to different people rather than the same person. The linkage strategy used for the ACLD was designed to ensure a high level of accuracy while also achieving a sufficiently high number of links to enable longitudinal research. Accordingly, the strategy was restrictive and conservative.

One of the key measures of linkage quality is the proportion of links in the dataset that are false. The number of false links can be estimated by clerically reviewing a sample of links or by using modelling techniques. Once the number of false links has been estimated, a 'precision' can be calculated: the estimated proportion of links that are matches (i.e. records belonging to the same entity). The false link rate is then simply one minus the precision.



With clerical review unavailable for the re-link of the 2006 Panel, the Feasibility Calculator (FC), a model designed by Chipperfield et al (2018), was used as the primary method of estimating precision and setting a cut-off for the 2006-2011 re-link of the 2006 Panel sample. The FC uses the theory developed by Fellegi & Sunter (1969) to simulate a record linkage many times in order to estimate precision. It then compiles the results of these simulated linkages to calculate, for each probabilistic linkage pass, the lowest linkage weight at which the desired level of precision is achieved. These results can then be used to inform a single cut-off point for the probabilistic linkage results. Because name information was not available, distinguishing a unique link is more difficult; to ensure a high-quality linkage while maintaining a high linkage rate, the desired estimated cumulative precision was set at 95%, or an estimated false link rate of approximately 5%. This method achieved a 77.2% linkage rate when linking the 2006 Panel to 2011 Census records.
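The Fellegi & Sunter (1969) framework scores a candidate record pair by summing, over linking fields, the log-likelihood ratio of the observed agreement under the match and non-match hypotheses (assuming fields agree independently). The sketch below is a simplified illustration of that idea and of the cut-off step described above; it is not the Feasibility Calculator itself. The field names and m/u probabilities are hypothetical, and the cut-off function assumes links from a simulated linkage in which true match status is known.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical probabilities per field:
#   m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "sex": (0.99, 0.50),
    "dob_day": (0.95, 1 / 30),
    "dob_month": (0.96, 1 / 12),
    "age": (0.90, 0.05),
}

def link_weight(agreement: Dict[str, bool]) -> float:
    """Fellegi-Sunter weight: sum of log2 likelihood ratios over linking fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if agreement[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def lowest_weight_at_precision(
    simulated: List[Tuple[float, bool]], target: float
) -> Optional[float]:
    """Lowest weight w such that all links with weight >= w have precision >= target.

    `simulated` holds (weight, is_true_match) pairs from a simulated linkage.
    Returns None if no weight achieves the target.
    """
    cutoff = None
    for w in sorted({weight for weight, _ in simulated}, reverse=True):
        kept = [is_true for weight, is_true in simulated if weight >= w]
        if sum(kept) / len(kept) >= target:
            cutoff = w
    return cutoff

all_agree = link_weight({f: True for f in FIELD_PROBS})
all_disagree = link_weight({f: False for f in FIELD_PROBS})
print(round(all_agree, 2), round(all_disagree, 2))
```

In the ACLD the target was a cumulative precision of 95%; in this sketch that corresponds to calling `lowest_weight_at_precision(simulated, 0.95)` separately for each pass.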

Precision estimation for the 2011-2016 linkage of the ACLD involved clerical review of a stratified random sample of links. Potential links were stratified by link weight (rounded down to the nearest integer), and at least 5% of links were sampled from each stratum. The results of the clerical review were used to calculate precision estimates for links grouped by pass and rounded link weight, which were then applied to the entire set of linkage results. This provided an estimate of precision for each individual link, referred to as 'marginal precision': the likelihood that a single link is 'true' (i.e. the records belong to the same person). Using the marginal precision, the 'cumulative precision' of the final set of one-to-one links, i.e. the overall precision of the linked dataset, could be estimated.
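The procedure above can be sketched as follows, assuming each clerically reviewed link is recorded as (pass, link weight, judged true). Grouping by pass and by link weight rounded down follows the text; the function names and the simple share-judged-true estimator are assumptions for illustration. Cumulative precision is taken here as the mean marginal precision over all links in the file, i.e. the expected share of true links.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Stratum = Tuple[int, int]  # (pass number, link weight rounded down)

def stratum_of(pass_no: int, weight: float) -> Stratum:
    return (pass_no, math.floor(weight))

def marginal_precision(reviewed: Iterable[Tuple[int, float, bool]]) -> Dict[Stratum, float]:
    """Share of clerically reviewed links judged true, per (pass, rounded weight) stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [judged true, reviewed]
    for pass_no, weight, judged_true in reviewed:
        tally = counts[stratum_of(pass_no, weight)]
        tally[0] += int(judged_true)
        tally[1] += 1
    return {s: true / n for s, (true, n) in counts.items()}

def cumulative_precision(links: Iterable[Tuple[int, float]], precision: Dict[Stratum, float]) -> float:
    """Mean marginal precision over all links (every stratum is assumed to have been reviewed)."""
    values = [precision[stratum_of(p, w)] for p, w in links]
    return sum(values) / len(values)

# Hypothetical clerical review sample and full set of links
reviewed = [(2, 12.3, True), (2, 12.8, True), (2, 12.1, False), (2, 15.5, True)]
prec = marginal_precision(reviewed)        # {(2, 12): 0.667, (2, 15): 1.0}
links = [(2, 12.4), (2, 15.9), (2, 15.2)]
print(cumulative_precision(links, prec))
```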

After producing both marginal and cumulative precision estimates, a cut-off point was selected. The cut-off is intended to optimise both the number of links and the cumulative precision of the links retained above it, while maintaining a high level of marginal precision for every individual link above the cut-off. The cut-off was selected using the marginal precision estimates: all links with a marginal precision of at least 81% were retained. Applying the cut-off produced a final file of 605,626 links, with an estimated cumulative precision of 98.6%, or a false link rate of 1.4%.
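A minimal sketch of applying a marginal-precision cut-off is shown below: links at or above the threshold (81%, as in the text) are kept, and the cumulative precision of the kept links is their mean marginal precision, with the false link rate as its complement. The function name and the six marginal precision values are hypothetical.

```python
from typing import List, Tuple

def apply_marginal_cutoff(marginal: List[float], threshold: float = 0.81) -> Tuple[int, float, float]:
    """Keep links whose marginal precision meets the threshold.

    Returns (links kept, cumulative precision of kept links, false link rate).
    """
    kept = [p for p in marginal if p >= threshold]
    if not kept:
        return 0, float("nan"), float("nan")
    precision = sum(kept) / len(kept)
    return len(kept), precision, 1 - precision

# Hypothetical marginal precisions for six candidate links
n_kept, precision, false_rate = apply_marginal_cutoff([1.0, 1.0, 0.95, 0.90, 0.81, 0.60])
print(n_kept, round(precision, 3), round(false_rate, 3))
```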

Clerical review relies on the judgment of well-trained reviewers; therefore, although efforts are made to minimise the risk, a link may be incorrectly assigned as a match or non-match. The method for measuring precision developed by Chipperfield et al (2018) was used to provide an independent model-based estimate of precision. While the clerical estimate of cumulative precision for the 2011-2016 linkage was 98.6%, the model-based approach estimated precision to be over 99%. The clerical review estimate was retained as the more conservative of the two.

Table 2 provides a summary of the precision estimate and false link rate by the pass where each link was selected (estimated via clerical review) for the 2011-2016 linkage.

TABLE 2 - PRECISION ESTIMATES AND FALSE LINK RATES, By Pass Number, 2011-2016 linkage (2006 Panel)

Pass number (a) | Proportion of overall links (%) | Estimated true link rate / precision estimate (%) | Estimated false link rate (%)
1 | 75.5 | 100 | 0
2 | 15.1 | 94.4 | 5.6
3 | 1.2 | 96.4 | 3.6
4 | 1.1 | 95.3 | 4.7
5 | 0.3 | 92.9 | 7.1
6 | 0.6 | 99.8 | 0.2
7 | 1.3 | 96.2 | 3.8
8 | 0.9 | 93.8 | 6.2
9 | 4.0 | 95.9 | 4.1

Total (b) | 100 | 98.6 | 1.4

(a) Pass number 1 refers to the deterministic linkage.
(b) Data presented in the table have not been perturbed.

The conservative and restrictive nature of the blocking and linking strategy, accompanied by quality controls that were implemented during clerical review and the desired level of linkage precision, helped to minimise the estimated number of false links throughout the linkage process.

Over three quarters of all links were achieved in the first pass of the project (78.4% for 2006-2011 and 75.5% for 2011-2016), which used a deterministic linking methodology to identify and filter matches. This pass implemented tight geographic and demographic restrictions to maximise the number of high quality links assigned and to limit the amount of alternative comparisons required. Using this approach, links were only accepted if a single unique record pair was identified.
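A deterministic pass of this kind can be sketched as exact agreement on a set of key fields, with a link accepted only when exactly one record on each side shares those values. In the sketch below, the geographic key (here a hypothetical `mesh_block` field) also acts as the blocking variable. The record layout and field names are hypothetical, and age is assumed to have already been aligned to a common reference date before comparison.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence
from typing import Dict, List, Tuple

Record = Dict[str, Hashable]

def deterministic_pass(left: List[Record], right: List[Record], keys: Sequence[str]) -> List[Tuple[Hashable, Hashable]]:
    """Link records that agree exactly on every key field, keeping only unique pairs.

    Records with a missing (None) key field are skipped. A pair is accepted only
    when exactly one left record and exactly one right record share the key values.
    """
    def index(records: List[Record]) -> Dict[tuple, List[Hashable]]:
        groups: Dict[tuple, List[Hashable]] = defaultdict(list)
        for rec in records:
            key = tuple(rec.get(k) for k in keys)
            if None not in key:
                groups[key].append(rec["id"])
        return groups

    left_groups, right_groups = index(left), index(right)
    links = []
    for key, left_ids in left_groups.items():
        right_ids = right_groups.get(key, [])
        if len(left_ids) == 1 and len(right_ids) == 1:
            links.append((left_ids[0], right_ids[0]))
    return links

# Hypothetical records; "b" and "c" share identical key values, so neither is linked
census_2006 = [
    {"id": "a", "mesh_block": 101, "sex": "F", "age": 35},
    {"id": "b", "mesh_block": 101, "sex": "M", "age": 42},
    {"id": "c", "mesh_block": 101, "sex": "M", "age": 42},
]
census_2011 = [
    {"id": "x", "mesh_block": 101, "sex": "F", "age": 35},
    {"id": "y", "mesh_block": 101, "sex": "M", "age": 42},
]
links = deterministic_pass(census_2006, census_2011, ["mesh_block", "sex", "age"])
print(links)
```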

3.1.2 Consistency of Common Information on Record Pairs

In data linkage projects, geographic boundaries function as blocking variables that restrict the search for links to records which agree on the defined geography. They are also used as linking variables, and when combined with other linking fields (such as hashed name (2011-2016 only), age, sex and date of birth), they provide a high level of uniqueness, and reduce the likelihood of linking to an incorrect record.

Tables 3a and 3b display the number of records that had consistent information on key linking variables, grouped by levels of geography.

TABLE 3a - CONSISTENCY OF LINKED RECORDS, By Geography And Selected Linking Fields, 2006 Panel, 2006-2011 linkage

Consistency of key linkage fields (a)(b)(c) | Records (no.) | Records (%)

MESH BLOCK
Age exact, Mesh Block, Sex, DOB Day and Month agree | 594,727 | 79.2
Age exact, Mesh Block, Sex agree | 34,784 | 4.6
Age +/- 1 year, Mesh Block, Sex agree | 20,190 | 2.7

STATISTICAL AREA LEVEL 2
Age exact, SA2, Sex, DOB Day and Month agree | 62,481 | 8.4
Age +/- 1 year, SA2, Sex, DOB Day and Month agree | 412 | 0.1
Age +/- 1 year, SA2, Sex agree | 16,513 | 2.2

STATISTICAL AREA LEVEL 4
Age exact, SA4, Sex, DOB Day and Month agree | 21,722 | 2.9

Total records included | 751,199 | 99.2

Total records linked | 756,945 | 100

(a) Only includes records that agree on all key linking fields.
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.
(c) Percentages may not add up to the total due to rounding.

TABLE 3b - CONSISTENCY OF LINKED RECORDS, By Geography And Selected Linking Fields, 2006 Panel, 2011-2016 linkage

Consistency of key linkage fields (a)(b)(c) | Records (no.) | Records (%)

MESH BLOCK
First name hash, Surname hash, Age exact, Mesh Block, Sex, DOB Day and Month agree | 371,378 | 61.3
First name hash, Surname hash, Age exact, Mesh Block, Sex agree | 93,577 | 15.5
Age exact, Mesh Block, Sex, DOB Day and Month agree | 66,203 | 10.9
Age exact, Mesh Block, Sex agree | 4,214 | 0.7
Age +/- 1 year, Mesh Block, Sex agree | 15,853 | 2.6

STATISTICAL AREA LEVEL 2
First name hash, Surname hash, Age +/- 1 year, SA2, Sex, DOB Day and Month agree | 17,215 | 2.8
Age exact, SA2, Sex, DOB Day and Month agree | 4,589 | 0.8
Age +/- 1 year, SA2, Sex agree | 3,234 | 0.5

STATISTICAL AREA LEVEL 4
First name hash, Surname hash, Age +/- 1 year, SA4, Sex, DOB Day and Month agree | 17,526 | 2.9
Age +/- 1 year, SA4, Sex, DOB Day and Month agree | 4,149 | 0.7

Total records included | 597,938 | 98.7

Total records linked | 605,618 | 100

(a) Only includes records that agree on all key linking fields.
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.
(c) Percentages may not add up to the total due to rounding.

Approximately 99% of all records matched in the ACLD linkage process agreed on small to medium levels of geographic area combined with other key linking fields, such as first name and surname hash codes (2011-2016 linkage only), age, sex and date of birth. Analysis of consistency from the 2006 Census to the 2016 Census was not undertaken due to complexities in comparing geography. While the number of consistent fields can give a strong indication of likely linkage quality, other factors should be taken into account, for example, the expected number of people in a geographic area who are likely to share a characteristic by chance. A tolerance of plus or minus one year on age was used in parts of the linkage process to allow for persons whose age was understated or overstated in one Census relative to another.
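The mutually exclusive categories in Table 3a can be reproduced by testing a linked pair against the categories in order and assigning it to the first one it satisfies, so that a record counted in one category is excluded from all later ones. The sketch below uses the Table 3a categories; the record field names are hypothetical, and ages are assumed to have already been aligned to a common reference date so that "exact" means a difference of zero.

```python
from typing import Dict, List, Optional, Tuple

# Ordered categories from Table 3a: (label, fields that must agree exactly, age tolerance in years)
CATEGORIES: List[Tuple[str, Tuple[str, ...], int]] = [
    ("Age exact, Mesh Block, Sex, DOB Day and Month agree", ("mesh_block", "sex", "dob_day", "dob_month"), 0),
    ("Age exact, Mesh Block, Sex agree", ("mesh_block", "sex"), 0),
    ("Age +/- 1 year, Mesh Block, Sex agree", ("mesh_block", "sex"), 1),
    ("Age exact, SA2, Sex, DOB Day and Month agree", ("sa2", "sex", "dob_day", "dob_month"), 0),
    ("Age +/- 1 year, SA2, Sex, DOB Day and Month agree", ("sa2", "sex", "dob_day", "dob_month"), 1),
    ("Age +/- 1 year, SA2, Sex agree", ("sa2", "sex"), 1),
    ("Age exact, SA4, Sex, DOB Day and Month agree", ("sa4", "sex", "dob_day", "dob_month"), 0),
]

def agrees(a: Dict, b: Dict, field: str) -> bool:
    """A field agrees only if it is present on both records and the values are equal."""
    return a.get(field) is not None and a.get(field) == b.get(field)

def consistency_category(a: Dict, b: Dict) -> Optional[str]:
    """Return the first category the linked pair satisfies, or None ("not included")."""
    if a.get("age") is None or b.get("age") is None:
        return None
    age_gap = abs(a["age"] - b["age"])
    for label, fields, tolerance in CATEGORIES:
        if age_gap <= tolerance and all(agrees(a, b, f) for f in fields):
            return label
    return None

person = {"mesh_block": 1, "sa2": 10, "sa4": 100, "sex": "F", "dob_day": 3, "dob_month": 7, "age": 40}
print(consistency_category(person, dict(person, age=41, dob_day=4)))
```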

By contrast, record pairs may have inconsistent information and yet be a match. Inconsistent information may be recorded for the same person in different Censuses due to a range of factors, including:
  • transcription errors in the Census, where the wrong category is selected or the information is transposed, such as the day the person was born being reported in the month field instead of in the day field;
  • data capture errors, where the Census form is scanned using Optical Character Recognition (OCR) software and certain characters may be mis-classified, such as a 1 captured as a 7 or a 3 as an 8;
  • reporting errors, where information is given for the wrong member of the household (e.g. person 1's information is reported for person 3) or where the person completing the Census form for a household guesses or estimates information about a fellow household member;
  • information that was not stated by the respondent and was imputed during Census processing (such as age or sex); imputed values were set to missing for linking but are included in the analytical dataset;
  • differences in how Census form questions are interpreted at each Census; or
  • differences in how questions are coded at each Census.

Of particular note is inconsistency due to non-reporting of name and date of birth in the 2011 and 2016 Censuses. Respondents are becoming less likely to provide their date of birth: the proportion reporting it fell from 90% in the 2011 Census to 81% in the 2016 Census. Further, just over one per cent of Australians had a missing, or blank, response for first name or surname in the 2016 Census. There appeared to be a relationship between having a missing response for both first name and surname and non-response on other variables: of the people who did not report first name and surname, approximately half did not report at least one of sex, age or Indigenous status. The vast majority of missing responses came from paper forms, and the overall level of missing responses in the 2016 Census remained low.

3.1.3 Comparison with the original 2006 Panel linkage

Table 4 compares the final results of the original 2006 Panel linkage with the revised linkage.

TABLE 4 – COMPARISON OF LINKAGE RESULTS, 2006-2011

Measure | Original linkage (2006-11) | Re-link (2006-11)
Linked records (no.) | 800,758 | 756,945
Linkage rate (%) | 82.0 | 77.2
Precision | Approx. 90-95% | 95%



Although the linkage rate is lower, there is greater confidence in the precision of the links achieved in the re-link of the 2006 Panel, due to the enhanced linking methodology. Across the entire Panel, 81.6% of records received the same outcome in both linkages (the same link identified, or not linked in either). The changes in links are shown in Table 5.

TABLE 5 – STATUS OF LINKS, 2006-2011 (a)

Link status | 2006 Panel records (no.) | 2006 Panel records (%)
Same link | 670,073 | 68.4
Different link | 37,206 | 3.8
New link | 49,666 | 5.1
Lost link | 93,479 | 9.5
Never linked | 129,238 | 13.2

(a) Data presented in the table have not been perturbed.
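Each 2006 Panel record falls into exactly one status in Table 5, determined by the 2011 record (if any) it was assigned in the original linkage and in the re-link. The sketch below encodes that classification (the function name and the use of None for "unlinked" are illustrative) and checks that the Table 5 counts reconcile with the totals in Table 4 and with the 81.6% figure quoted above.

```python
from collections.abc import Hashable
from typing import Optional

def link_status(original: Optional[Hashable], relink: Optional[Hashable]) -> str:
    """Compare the 2011 record each linkage assigned to a 2006 Panel record (None = unlinked)."""
    if original is None and relink is None:
        return "Never linked"
    if original is None:
        return "New link"
    if relink is None:
        return "Lost link"
    return "Same link" if original == relink else "Different link"

# Counts from Table 5
table5 = {
    "Same link": 670_073,
    "Different link": 37_206,
    "New link": 49_666,
    "Lost link": 93_479,
    "Never linked": 129_238,
}
relinked = table5["Same link"] + table5["Different link"] + table5["New link"]
originally_linked = table5["Same link"] + table5["Different link"] + table5["Lost link"]
panel_total = sum(table5.values())
unchanged_share = (table5["Same link"] + table5["Never linked"]) / panel_total
print(relinked, originally_linked, panel_total, f"{unchanged_share:.1%}")
```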


3.2 CHARACTERISTICS OF LINKED AND UNLINKED 2006 ACLD PANEL SAMPLE

The random sample selected from the 2006 Census for the 2006 ACLD Panel was designed to be representative of the Australian population by age, sex and jurisdiction as well as other characteristics such as Indigenous status and country of birth.

Table 6 shows the distribution of key populations across the 2006 Census, the 2006 ACLD Panel sample and the 2006-2011 linked results.

TABLE 6 - SELECTED CHARACTERISTICS, By 2006 Census, 2006 ACLD Panel Sample, 2006-2011 ACLD Linked Results

Characteristic | 2006 Census (no.) | 2006 Census (%) | 2006 Panel Sample (no.) | 2006 Panel Sample (%) | Linked Results 2006-2011 (no.) | Linked Results 2006-2011 (%) | Weighted Linked Results 2006-2011 (a) (no.) | Weighted Linked Results 2006-2011 (a) (%)

SEX
Male | 9 896 500 | 49.3 | 480 285 | 49.0 | 364 727 | 48.2 | 9 085 806 | 49.3
Female | 10 165 146 | 50.7 | 499 372 | 51.0 | 392 222 | 51.8 | 9 326 779 | 50.7

STATE/TERRITORY OF USUAL RESIDENCE
New South Wales | 6 549 174 | 32.6 | 323 136 | 33.0 | 250 070 | 33.0 | 6 037 632 | 32.8
Victoria | 4 932 422 | 24.6 | 244 095 | 24.9 | 191 981 | 25.4 | 4 620 771 | 25.1
Queensland | 3 904 531 | 19.5 | 192 606 | 19.7 | 144 427 | 19.1 | 3 611 492 | 19.6
South Australia | 1 514 340 | 7.5 | 75 481 | 7.7 | 59 386 | 7.8 | 1 384 197 | 7.5
Western Australia | 1 959 088 | 9.8 | 95 795 | 9.8 | 73 948 | 9.8 | 1 843 423 | 10.0
Tasmania | 476 481 | 2.4 | 23 787 | 2.4 | 18 624 | 2.5 | 428 564 | 2.3
Northern Territory | 192 899 | 1.0 | 8 469 | 0.9 | 5 573 | 0.7 | 189 052 | 1.0
Australian Capital Territory | 324 034 | 1.6 | 16 186 | 1.7 | 12 866 | 1.7 | 295 676 | 1.6

AGE GROUP
0-9 | 2 579 496 | 12.9 | 127 331 | 13.0 | 92 684 | 12.2 | 2 586 620 | 14.0
10-19 | 2 756 102 | 13.7 | 132 937 | 13.6 | 97 586 | 12.9 | 2 547 772 | 13.8
20-29 | 2 684 371 | 13.4 | 128 760 | 13.1 | 91 875 | 12.1 | 2 650 903 | 14.4
30-39 | 2 893 058 | 14.4 | 140 271 | 14.3 | 113 887 | 15.0 | 2 891 972 | 15.7
40-49 | 2 942 353 | 14.7 | 142 911 | 14.6 | 120 932 | 16.0 | 2 875 902 | 15.6
50-59 | 2 574 589 | 12.8 | 126 285 | 12.9 | 107 379 | 14.2 | 2 412 469 | 13.1
60-69 | 1 733 297 | 8.6 | 86 385 | 8.8 | 72 041 | 9.5 | 1 514 365 | 8.2
70-79 | 1 168 675 | 5.8 | 58 277 | 5.9 | 43 486 | 5.7 | 755 872 | 4.1
80 and over | 729 705 | 3.6 | 36 502 | 3.7 | 17 071 | 2.3 | 176 533 | 1.0

INDIGENOUS STATUS
Non-Indigenous | 18 266 814 | 91.1 | 942 253 | 96.2 | 733 032 | 96.8 | 17 654 813 | 95.9
Aboriginal and/or Torres Strait Islander | 455 027 | 2.3 | 21 985 | 2.2 | 13 892 | 1.8 | 524 075 | 2.8
Aboriginal | 407 700 | 2.0 | 19 697 | 2.0 | 12 449 | 1.6 | 466 722 | 2.5
Torres Strait Islander | 29 515 | 0.1 | 1 449 | 0.1 | 940 | 0.1 | 38 103 | 0.2
Both Aboriginal and Torres Strait Islander | 17 812 | 0.1 | 839 | 0.1 | 503 | 0.1 | 19 250 | 0.1
Not stated | 1 133 449 | 5.6 | 15 416 | 1.6 | 10 020 | 1.3 | 233 603 | 1.3

Total (b)(c)(d) | 20 061 646 | 100 | 979 662 | 100 | 756 945 | 100 | 18 412 584 | 100

(a) For more information on weighting see Section 3.4 Weighting.
(b) Data presented in the table have been perturbed. As a result the sum of individual categories may not align with totals.
(c) Includes Other Territories.
(d) Includes Migratory areas.


The distribution of the ACLD file by sub-population was generally well aligned with both the 2006 Panel sample and the entire 2006 Census. However, some differences become clearer when the relative differences between these proportions are examined.

Compared with the entire 2006 Census, the linked 2006 ACLD Panel contains relatively more records for people aged 40-49 and 50-59 years and, to a lesser extent, those aged 60-69 years. By contrast, the linked 2006 Panel contains relatively fewer records for people aged 20-29 years and 80 years and over. This applies to both the 2006-2011 and 2006-2011-2016 linkages, with larger differences from the 2006 Census for the latter.

In general, the distribution of weighted counts for the linked ACLD file is close to that of the entire 2006 Census, but the weighting process is not designed to produce counts corresponding to the population in 2006. Rather, the weighted population is that of people who were in scope of both the 2006 and 2011 Censuses for the 2006-2011 linkage, and of people who were in scope of the 2006, 2011 and 2016 Censuses for the 2006-2011-2016 linkage (see Section 3.4 Weighting). Thus, for example, the lower proportion of older people in the linked file, even after weighting, reflects the impact on the 2006 Panel sample of deaths that occurred between 2006 and 2016.

Further data cubes demonstrating more detailed population distributions are provided as an attachment to this Information paper.


3.3 REASONS FOR UNLINKED RECORDS

There are two main reasons why records from the 2006 Panel sample were not linked to a 2011 Census and/or 2016 Census record:
  • records belonging to the same individual were present in the 2006 Panel sample and the 2011 and/or 2016 Census but these records failed to be linked because they contained missing or inconsistent information; or
  • there was no 2011 or 2016 Census record corresponding to the 2006 Panel sample record because the person was not counted in the 2011 and/or 2016 Census.

3.3.1 Missing and/or inconsistent information

In these cases, the true match was present in the pool of all record pairs but it was not identified because there was a high level of inconsistency between information on each Census, or key linking fields were missing altogether. The reasons for the match being missed can be categorised into the following groups:
  • the missing or inconsistent information did not allow the record pair to be compared in the same blocking categories and could not be linked;
  • the record pair did not contain enough unique common information to distinguish the match from other potential record pairs;
  • the record pair was compared but was assigned a low link weight, because it contained a lot of missing or inconsistent information, and so fell below the cut-off identified through sample clerical review or modelling via the Feasibility Calculator; or
  • the record pair was subjected to clerical review, but the high level of inconsistency did not enable it to be deemed a true link.

Accurate address coding was crucial in narrowing the search and differentiating between true and false links. It was a particular challenge for persons who had moved, since linkage then depended heavily on accurate recall and detailed information supplied in the 2011 and 2016 Censuses about the person’s address five years earlier. Processing for the 2011 and 2016 Censuses involved coding address five years ago to a fine level of geography, ideally Mesh Block. This was not always possible because some persons supplied insufficient and/or incorrect address information, potentially due to recall issues.

3.3.2 No 2011 or 2016 Census Record

A person included in the 2006 Panel sample may have had no equivalent 2011 and/or 2016 Census record because they were no longer in scope for the Census (due to migration from Australia or death between 2006 and 2016), or because they were simply missed in the Census. A 2006 Panel record that was not linked to the 2011 Census had no opportunity to be linked to the 2016 Census, because the 2016 linkage was carried out only through linked 2011 Census records.

According to mortality data compiled by the ABS from data supplied by the Registrars of Births, Deaths and Marriages, approximately 700,000 people died in Australia between 2006 and 2011 and approximately 913,000 between 2011 and 2016. If 5% of the people who died between 2006 and 2011 were selected in the 2006 Panel sample, then up to 35,000 Panel records could not have been linked due to death in that period. Similarly, migration data indicate that just over one million people left Australia as permanent emigrants between 2006 and 2011, and just over 1.4 million between 2011 and 2016; on the same basis, up to 50,000 people from the 2006 Panel sample are unlikely to have a corresponding 2011 Census record due to emigration. For more information please refer to the relevant releases of Migration, Australia (cat. no. 3412.0) and Deaths, Australia (cat. no. 3302.0).

Due to the size and complexity of the Census, it is inevitable that some people are missed and some are counted more than once. It is for this reason that the Census Post Enumeration Survey (PES) is run shortly after each Census, to provide an independent measure of Census coverage. The PES determines how many people should have been counted in the Census, how many were missed (undercount), and how many were counted more than once (overcount). It also provides information on the characteristics of those in the population who have been under- or overcounted.

The net undercount rate was 1.7% for the 2011 Census and 1% for the 2016 Census, with higher rates for Aboriginal and Torres Strait Islander people than for the non-Indigenous population. Thus approximately 15,000 people from the 2006 Panel sample could have been missed in the 2011 Census. This estimate is a starting point only and does not take into account the likelihood of people being missed in successive Censuses. For more information please refer to Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0) for the 2011 Census and Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0) for the 2016 Census.

When taking into account all of these factors, it is estimated that nearly half of the unlinked 2006 Panel sample (100,000 out of the 222,717 unlinked records) would not have a corresponding record in the 2011 Census. This would indicate that the initial linkage rate of 77% for the 2006-2011 linkage could be representative of up to 89% of the population that actually had an opportunity to be linked.
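The 2006-2011 arithmetic above can be reproduced as follows, scaling the national death and emigration counts by the Panel's 5% sampling fraction. One assumption is made here that the text does not state: the 1.7% net undercount is applied to the Panel records remaining after deaths and emigration, which reproduces the figure of approximately 15,000 (applying it to the whole sample would give roughly 16,700). Variable names are illustrative.

```python
SAMPLE_2006 = 979_662      # 2006 Panel sample records
LINKED_2011 = 756_945      # records linked to the 2011 Census
SAMPLING_FRACTION = 0.05   # the 2006 Panel is a 5% sample of the 2006 Census

deaths = SAMPLING_FRACTION * 700_000       # deaths, 2006-2011
emigrants = SAMPLING_FRACTION * 1_000_000  # permanent emigrants, 2006-2011
# Assumption: undercount rate applied to records still in scope after deaths and emigration
missed = 0.017 * (SAMPLE_2006 - deaths - emigrants)

no_record = deaths + emigrants + missed
unlinked = SAMPLE_2006 - LINKED_2011
print(f"deaths ~{deaths:,.0f}, emigrants ~{emigrants:,.0f}, missed ~{missed:,.0f}")
print(f"no 2011 record ~{no_record:,.0f} of {unlinked:,} unlinked ({no_record / unlinked:.1%})")
```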

The 2011 Census records linked to the 2006 Panel represent approximately 3.5% of the 2011 Census population, so the same factors can be scaled by 3.5% for the 2011-2016 linkage. Approximately 913,000 people died between 2011 and 2016, so it could be estimated that almost 32,000 people could not have been linked due to death in that period. Similarly, just over 1.4 million people left Australia as permanent emigrants between 2011 and 2016, potentially resulting in approximately 49,000 people being unlikely to have a corresponding 2016 Census record due to emigration. Taking into account the net undercount rate of 1% for the 2016 Census, it is estimated that almost 8,000 persons may have been missed and therefore have no corresponding 2016 Census record.

Therefore, it is estimated that almost 59% of the unlinked 2011 Census records from the 2006 Panel sample (89,000 of 151,000 unlinked records) would not have had a corresponding record in the 2016 Census. This would indicate that the 2011-2016 linkage rate of 80% could be representative of almost 92% of the population that actually had an opportunity to be linked.

Thus it is estimated that 50% of the unlinked records from the 2006 Panel (189,000 of 374,000 unlinked records) would not have a corresponding record in the 2016 Census; however, this estimate does not take into account persons who were out of scope of, or missed in, the 2011 Census but may have come back into scope for the 2016 Census. This would indicate that the 62% linkage rate for the 2006 Panel records linked to both the 2011 Census and the 2016 Census could be representative of approximately 81% of the population that actually had an opportunity to be linked from 2006 through to 2016.
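The 2011-2016 and combined estimates can be reproduced in the same way. In this sketch, the death and emigration counts are scaled by the 3.5% figure given in the text, while the 2016 undercount of almost 8,000 is taken from the text as given rather than recomputed; the combined figure adds the rounded no-record estimates for the two periods (100,000 and 89,000). Variable names are illustrative.

```python
SAMPLE_2006 = 979_662   # 2006 Panel sample records
LINKED_2011 = 756_945   # linked to the 2011 Census
LINKED_2016 = 605_618   # linked to both the 2011 and 2016 Censuses
SCALE = 0.035           # linked 2011 records as a share of the 2011 Census population

deaths = SCALE * 913_000         # deaths, 2011-2016
emigrants = SCALE * 1_400_000    # permanent emigrants, 2011-2016
missed = 8_000                   # 2016 undercount estimate, as stated in the text

no_record_2016 = deaths + emigrants + missed
unlinked_2011 = LINKED_2011 - LINKED_2016
share_2011 = no_record_2016 / unlinked_2011

# Combined 2006-2016: rounded no-record estimates for each period
no_record_total = 100_000 + 89_000
unlinked_total = SAMPLE_2006 - LINKED_2016
share_total = no_record_total / unlinked_total

print(f"2011-2016: ~{no_record_2016:,.0f} of {unlinked_2011:,} unlinked ({share_2011:.1%})")
print(f"2006-2016: {no_record_total:,} of {unlinked_total:,} unlinked ({share_total:.1%})")
```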


3.4 WEIGHTING

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. In the case of the ACLD, populations are defined in terms of a set of Censuses. The 2006-2011 longitudinal population is defined as those people in scope of the 2006 and 2011 Censuses while the 2006-2011-2016 longitudinal population is those people in scope of the 2006, 2011 and 2016 Censuses. The longitudinal weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2006, 2011 and 2016 are available from each Census.

The 2006 Panel of the ACLD is a random sample of 5% of the 2006 Census. As such, each person in the sample should represent about 20 people in the 2006 Census population. Between Censuses, however, the in-scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves. The weights of the linked records on the ACLD were calibrated to the estimated population that was in scope of the 2006 and 2011 Censuses, and then again to the population in scope of the 2006, 2011 and 2016 Censuses. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.
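A simplified sketch of how the four components could fit together is shown below: a design weight of 1/0.05 = 20, multiplied by adjustment factors, then calibrated so that weighted totals match population benchmarks. This is not the ABS method (see the Appendix for that). The adjustment factor values, the single "group" benchmarking variable and the use of simple post-stratification are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLING_FRACTION = 0.05
DESIGN_WEIGHT = 1 / SAMPLING_FRACTION  # each sampled person represents about 20 people

def initial_weight(undercoverage_adj: float, missed_link_adj: float) -> float:
    """Design weight multiplied by (hypothetical) undercoverage and missed link adjustments."""
    return DESIGN_WEIGHT * undercoverage_adj * missed_link_adj

def calibrate(records: List[Dict], benchmarks: Dict[str, float]) -> List[Dict]:
    """Post-stratify: scale weights within each group so they sum to that group's benchmark."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["group"]] += rec["weight"]
    return [
        {**rec, "weight": rec["weight"] * benchmarks[rec["group"]] / totals[rec["group"]]}
        for rec in records
    ]

# Hypothetical linked records and benchmarks
linked = [
    {"id": 1, "group": "F", "weight": initial_weight(1.02, 1.10)},
    {"id": 2, "group": "F", "weight": initial_weight(1.02, 1.25)},
    {"id": 3, "group": "M", "weight": initial_weight(1.03, 1.15)},
]
weighted = calibrate(linked, {"F": 55.0, "M": 26.0})
for rec in weighted:
    print(rec["id"], rec["group"], round(rec["weight"], 2))
```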

Two distinct weights were designed to allow analysis of either the 2006-2011 or the 2006-2011-2016 longitudinal population. Separate weights have not been produced for the 2011-2016 longitudinal population linked within the 2006 Panel; analysis of this population should instead use the 2011 Panel, which was designed to be representative of the 2011 Census population.

The mean final weights for the linked records are 25.0 for females and 26.6 for males in the 2006-2011 longitudinal population, and 29.4 for females and 31.5 for males in the 2006-2011-2016 longitudinal population. The weights range from 16.05 to 176.9 for 2006-2011 and from 15.9 to 341.3 for 2006-2011-2016. Mean weights were higher for Aboriginal and Torres Strait Islander persons and for people in the Northern Territory.

The 2006-2011 and 2006-2011-2016 longitudinal population benchmarks are based on the 2011 and 2016 Estimated Resident Population (ERP), adjusted by the estimated probability that a person belongs to the longitudinal population. This probability is formed using the 'address five years ago' variable reported in the 2011 or 2016 Census. Further information on this approach can be found in the paper Chipperfield, Brown & Watson (2016). See References section for details of this publication.
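As a rough illustration of the benchmark adjustment, the sketch below multiplies ERP for each group by an estimated probability of belonging to the longitudinal population. How that probability is formed here (the share of a group's Census respondents reporting an Australian address five years ago) is a hypothetical simplification, not the approach of Chipperfield, Brown & Watson (2016); the group labels and all numbers are invented for the example.

```python
from typing import Dict

def p_from_address_five_years_ago(in_australia_5y_ago: int, respondents: int) -> float:
    """Hypothetical estimate: share of respondents reporting an Australian address five years ago."""
    return in_australia_5y_ago / respondents

def longitudinal_benchmark(erp: Dict[str, float], p_in_scope: Dict[str, float]) -> Dict[str, float]:
    """Benchmark for each group = ERP x estimated probability of belonging to the longitudinal population."""
    return {group: erp[group] * p_in_scope[group] for group in erp}

# Hypothetical inputs for two groups
p = {
    "F": p_from_address_five_years_ago(9_500, 10_000),
    "M": p_from_address_five_years_ago(9_400, 10_000),
}
bench = longitudinal_benchmark({"F": 1_000_000, "M": 980_000}, p)
print({group: round(value) for group, value in bench.items()})
```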

For more information about weighting please refer to the Appendix.